Biostatistics For Dummies (Monika Wahi, John Pezzullo)

Log-Rank Test

The log-rank test can be performed on individual-level data or on data that has been summarized into a life-table format. In this section, we describe how to run a log-rank test with statistical software, which is how it is usually done. Next, to help you understand the underlying calculations, we describe the log-rank test calculations in detail using the life table, as you might carry them out manually in spreadsheet software such as Microsoft Excel.

Understanding what the log-rank test is doing

A two-group log-rank test asks whether events (deaths, in our example) are split between the two groups in the same proportion as the number of at-risk individuals in the two groups. The computer selects one of the groups and, in each time slice, compares the observed number of deaths in that group with the number expected if deaths were split in proportion to the number at risk. It then sums these observed-minus-expected differences over all the time slices to get the total excess deaths for that group. The excess-death sum is then scaled down, meaning it is divided by an estimate of its standard deviation. (Later in this chapter we describe how to calculate that standard deviation estimate.) If the null hypothesis is true, the random sampling fluctuations of this scaled excess-death sum should approximately follow a standard normal distribution, so a p value can easily be calculated from it. The null hypothesis of the log-rank test is that there is no difference in survival between the two groups, so a p value less than your selected α (usually 0.05) indicates a statistically significant difference.

Don’t worry if the preceding paragraph makes your head spin. It is only meant to give you a general sense of what the log-rank test calculations are doing.
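For readers who find code easier to follow than prose, here is a minimal Python sketch of the calculation just described. It assumes the data have already been summarized into a life table in which each time slice records the number at risk and the number of deaths in each group; the function name log_rank_from_life_table and the (n1, d1, n2, d2) layout are our own illustrative choices. For the standard deviation, the sketch uses the common hypergeometric variance formula for each slice, which may be laid out differently from the manual calculation described later in this chapter. Many programs report the square of the z statistic as a chi-square value with 1 degree of freedom instead of z itself.

```python
import math


def log_rank_from_life_table(slices):
    """Two-group log-rank test from life-table counts (illustrative sketch).

    Each slice is (n1, d1, n2, d2): the number at risk and the number of
    deaths in group 1 and in group 2 during one time slice.
    Returns (excess deaths in group 1, z statistic, two-sided p value).
    """
    excess = 0.0    # running sum of observed minus expected deaths, group 1
    variance = 0.0  # running sum of the per-slice variances
    for n1, d1, n2, d2 in slices:
        n = n1 + n2  # total number at risk in this slice
        d = d1 + d2  # total number of deaths in this slice
        if n == 0 or d == 0:
            continue  # a slice with no deaths contributes nothing
        # Expected deaths in group 1 if deaths split in proportion to those at risk
        expected1 = d * n1 / n
        excess += d1 - expected1
        if n > 1:
            # Hypergeometric variance of d1, given the slice's totals
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("variance is zero; the test statistic is undefined")
    z = excess / math.sqrt(variance)       # scale by the estimated standard deviation
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p value from the standard normal
    return excess, z, p
```

As a hypothetical example, a single slice with 5 at risk and 2 deaths in group 1 versus 5 at risk and no deaths in group 2 gives an excess of 1 death, a z of 1.5, and a two-sided p value of about 0.13.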

Running the log-rank test with software

Most commercial statistical software packages (like those described in Chapter 4) can perform a log-rank test. You first organize your data into a table that has one row per individual and these three columns:

Group: The group variable contains a code indicating the individual’s group. In this example, we could use the codes Drug = 1 and Control = 2.

Time: A numerical variable containing the individual’s survival time. For individuals who experience the event during the study, it is the time to the event. For censored individuals, it is the time to the end of observation.

Event status: A variable that indicates the individual’s status at the end of observation. If the individual experienced the event, it is usually coded as 1; if not (that is, if the individual is censored), it is coded as 0.
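Data in this one-row-per-individual layout can be collapsed into the life table used for manual calculations. The Python sketch below assumes the codes 1 and 2 for the two groups and treats every distinct event time as its own time slice (a grouped life table with wider intervals, such as years, would bin the times first). An individual censored at time t is counted as still at risk at t, which is a common convention. The names Subject and life_table are hypothetical.

```python
from collections import namedtuple

# One row per individual, mirroring the three columns described above:
# group code, survival time, and event status (1 = event, 0 = censored).
Subject = namedtuple("Subject", ["group", "time", "event"])


def life_table(subjects, group1=1, group2=2):
    """Collapse individual-level rows into (time, n1, d1, n2, d2) per event time."""
    # Only times at which at least one event occurred become time slices
    event_times = sorted({s.time for s in subjects if s.event == 1})
    rows = []
    for t in event_times:
        # At risk at time t: anyone whose recorded time is t or later
        # (so someone censored at exactly t still counts as at risk at t)
        n1 = sum(1 for s in subjects if s.group == group1 and s.time >= t)
        n2 = sum(1 for s in subjects if s.group == group2 and s.time >= t)
        # Events occurring exactly at time t in each group
        d1 = sum(1 for s in subjects
                 if s.group == group1 and s.time == t and s.event == 1)
        d2 = sum(1 for s in subjects
                 if s.group == group2 and s.time == t and s.event == 1)
        rows.append((t, n1, d1, n2, d2))
    return rows
```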

To run the log-rank test, you tell your computer program which variable is the group variable, which one contains the survival time, and which one contains the event status. The program should produce a p value for the log-rank test. If you set α = 0.05 and the p value is less than that, you reject the null hypothesis and conclude that the two groups have statistically significantly different survival curves.
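The whole procedure can be sketched end to end in Python. The hypothetical function log_rank_test below takes individual-level records (dictionaries) plus the names of the group, time, and event-status columns, much as you would tell a statistics program which variable is which, and returns the two-sided p value along with whether it falls below α. It is a teaching sketch under stated assumptions (each distinct event time is a slice, hypergeometric variance, normal approximation), not a substitute for a validated statistics package.

```python
import math


def log_rank_test(records, group_col, time_col, event_col, alpha=0.05):
    """Two-group log-rank test on individual-level records (illustrative sketch).

    group_col, time_col, and event_col name the columns holding the group
    code, the survival time, and the event status (1 = event, 0 = censored).
    Returns (two-sided p value, True if p < alpha).
    """
    groups = sorted({r[group_col] for r in records})
    if len(groups) != 2:
        raise ValueError("expected exactly two groups")
    g1 = groups[0]  # the group whose excess deaths are summed
    event_times = sorted({r[time_col] for r in records if r[event_col] == 1})
    excess = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [r for r in records if r[time_col] >= t]
        died = [r for r in at_risk if r[time_col] == t and r[event_col] == 1]
        n, d = len(at_risk), len(died)
        n1 = sum(1 for r in at_risk if r[group_col] == g1)
        d1 = sum(1 for r in died if r[group_col] == g1)
        excess += d1 - d * n1 / n  # observed minus expected deaths in g1
        if n > 1:
            # Hypergeometric variance for this slice
            variance += n1 * (n - n1) * d * (n - d) / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("variance is zero; the test statistic is undefined")
    z = excess / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return p, p < alpha
```

For example, calling log_rank_test(rows, "arm", "months", "dead") on a hypothetical list of dictionaries with those keys corresponds to pointing a statistics program at the group, time, and event-status variables.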

In addition to the p value, the software may output the median survival time for each group, along with its confidence interval, and the difference in median times between groups. If possible, you will also want